AI023
Introduction to Triton Programming
Implementing Your First Kernel: Vector Addition
Learning Objectives
- Identify the core components of a CUDA kernel, including the __global__ function specifier
- Allocate device memory and transfer data between host and device
- Calculate global thread indices to map data elements to individual GPU threads
- Launch a kernel with a grid and block configuration, then synchronize with the host to wait for it to finish
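
As a preview of how these objectives fit together, here is a minimal sketch of vector addition in CUDA. It is an illustration under stated assumptions, not the lesson's reference solution: the names vectorAdd and vectorAddMaxError, the block size of 256 threads, and the fill values 1.0 and 2.0 are all hypothetical choices made for this example.

```cuda
#include <cmath>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel (objective 1): __global__ marks a function that runs on the device
// and is launched from the host.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    // Objective 3: global thread index = block offset + index within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may have more threads than elements
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: adds two n-element vectors on the GPU and returns the
// largest absolute error against the expected sum, or -1.0f on any failure.
float vectorAddMaxError(int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* hA = static_cast<float*>(malloc(bytes));
    float* hB = static_cast<float*>(malloc(bytes));
    float* hC = static_cast<float*>(malloc(bytes));
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = hA && hB && hC;

    if (ok) {
        for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }
    }

    // Objective 2: allocate device memory and copy the inputs host -> device.
    ok = ok && cudaMalloc((void**)&dA, bytes) == cudaSuccess;
    ok = ok && cudaMalloc((void**)&dB, bytes) == cudaSuccess;
    ok = ok && cudaMalloc((void**)&dC, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // Objective 4: choose a block size (assumed: 256) and enough blocks
        // to cover all n elements, then launch and wait for completion.
        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        vectorAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
        ok = cudaGetLastError() == cudaSuccess;
        ok = ok && cudaDeviceSynchronize() == cudaSuccess;
    }

    // Copy the result device -> host.
    ok = ok && cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    float maxErr = -1.0f;
    if (ok) {
        maxErr = 0.0f;
        for (int i = 0; i < n; ++i) {
            maxErr = fmaxf(maxErr, fabsf(hC[i] - 3.0f));
        }
    }

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return maxErr;
}
```

To try it, call vectorAddMaxError from a main function (for example with n = 1 << 20) and compile the file with nvcc. The bounds check in the kernel matters whenever n is not a multiple of the block size, because the grid is rounded up and the extra threads must not write past the end of the arrays.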